TeD-SPAD: Temporal Distinctiveness for Self-supervised Privacy-preservation for video Anomaly Detection
Video anomaly detection (VAD) without human monitoring is a complex computer
vision task that can have a positive impact on society if implemented
successfully. While recent advances have made significant progress in solving
this task, most existing approaches overlook a critical real-world concern:
privacy. With the increasing popularity of artificial intelligence
technologies, it becomes crucial to implement proper AI ethics into their
development. Privacy leakage in VAD allows models to pick up and amplify
unnecessary biases related to people's personal information, which may lead to
undesirable decision making. In this paper, we propose TeD-SPAD, a
privacy-aware video anomaly detection framework that destroys visual private
information in a self-supervised manner. In particular, we propose the use of a
temporally-distinct triplet loss to promote temporally discriminative features,
which complements current weakly-supervised VAD methods. Using TeD-SPAD, we
achieve a favorable trade-off between privacy protection and utility (anomaly
detection performance) on three popular weakly-supervised VAD datasets:
UCF-Crime, XD-Violence, and ShanghaiTech. Our proposed anonymization model
reduces private attribute prediction by 32.25% while only reducing frame-level
ROC AUC on the UCF-Crime anomaly detection dataset by 3.69%. Project Page:
https://joefioresi718.github.io/TeD-SPAD_webpage/
Comment: ICCV 2023
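The abstract's core training signal is a triplet loss that pushes features from different time steps apart so the anonymized video remains useful for anomaly detection. The paper's exact formulation is not given here; a minimal sketch of a generic triplet margin loss over temporal features (the function name, sampling convention, and margin value are illustrative assumptions) could look like:

```python
import numpy as np

def temporal_triplet_loss(anchor, positive, negative, margin=1.0):
    """Generic triplet margin loss (sketch, not the paper's exact loss).

    anchor, positive: feature vectors that should be close (e.g. two views
    of the same temporal segment); negative: features from a different
    temporal segment that should be pushed at least `margin` further away,
    making the representation temporally discriminative.
    Shapes: (batch, dim). Returns a scalar loss.
    """
    d_pos = np.linalg.norm(anchor - positive, axis=-1)
    d_neg = np.linalg.norm(anchor - negative, axis=-1)
    return np.maximum(d_pos - d_neg + margin, 0.0).mean()
```

When the negative is already far away, the hinge is inactive and the loss is zero; when positive and negative collapse onto the same point, the loss equals the margin, penalizing temporally indistinct features.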
TimeBalance: Temporally-Invariant and Temporally-Distinctive Video Representations for Semi-Supervised Action Recognition
Semi-supervised learning stands to benefit the video domain more than images
because videos carry a higher annotation cost and dimensionality. Moreover,
any video understanding task requires reasoning over both spatial and temporal
dimensions. In order to learn both the static and motion related features for
the semi-supervised action recognition task, existing methods rely on hard
input inductive biases, such as using two modalities (RGB and optical flow) or
two streams with different playback rates. Instead of utilizing unlabeled videos
through diverse input streams, we rely on self-supervised video
representations; in particular, we utilize temporally-invariant and
temporally-distinctive representations. We observe that these representations
complement each other depending on the nature of the action. Based on this
observation, we propose a student-teacher semi-supervised learning framework,
TimeBalance, where we distill the knowledge from a temporally-invariant and a
temporally-distinctive teacher. Depending on the nature of the unlabeled video,
we dynamically combine the knowledge of these two teachers based on a novel
temporal similarity-based reweighting scheme. Our method achieves
state-of-the-art performance on three action recognition benchmarks: UCF101,
HMDB51, and Kinetics400.
Code: https://github.com/DAVEISHAN/TimeBalance
Comment: CVPR 2023
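The abstract describes dynamically combining the predictions of a temporally-invariant and a temporally-distinctive teacher based on a temporal-similarity score of the unlabeled clip. The reweighting scheme itself is not detailed here; a minimal sketch (the function names and the convention that the weight `w` in [0, 1] favors the invariant teacher are illustrative assumptions) might be:

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the class dimension.
    e = np.exp(logits - np.max(logits))
    return e / e.sum()

def combine_teacher_predictions(inv_logits, dist_logits, w):
    """Blend two teachers' class distributions (sketch, not the paper's scheme).

    w -> 1 trusts the temporally-invariant teacher (static-appearance actions);
    w -> 0 trusts the temporally-distinctive teacher (motion-heavy actions).
    In the paper, w would come from a temporal-similarity measure of the
    unlabeled video; here it is simply passed in.
    """
    return w * softmax(inv_logits) + (1.0 - w) * softmax(dist_logits)
```

The blended output stays a valid probability distribution for any `w` in [0, 1], so it can directly supervise the student with a standard cross-entropy distillation loss.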